Chernoff Bound

In probability theory, the Chernoff bound gives exponentially decreasing bounds on tail distributions of sums of independent random variables. Despite being named after Herman Chernoff, the author of the paper it first appeared in, the result is due to Herman Rubin. It is a sharper bound than the first- or second-moment-based tail bounds such as Markov's inequality or Chebyshev's inequality, which only yield power-law bounds on tail decay. However, the Chernoff bound requires the variates to be independent, a condition that is not required by either Markov's inequality or Chebyshev's inequality (although Chebyshev's inequality does require the variates to be pairwise independent). The Chernoff bound is related to the Bernstein inequalities, which were developed earlier, and to Hoeffding's inequality.


The generic bound

The generic Chernoff bound for a random variable ''X'' is attained by applying Markov's inequality to e^{tX}. This gives a bound in terms of the moment-generating function of ''X''. For every ''t'' > 0:
:\Pr(X \geq a) = \Pr(e^{tX} \geq e^{ta}) \leq \frac{\operatorname E\left[e^{tX}\right]}{e^{ta}}.
Since this bound holds for every positive ''t'', we have:
:\Pr(X \geq a) \leq \inf_{t > 0} \frac{\operatorname E\left[e^{tX}\right]}{e^{ta}}.
The Chernoff bound sometimes refers to the above inequality, which was first applied by Sergei Bernstein to prove the related Bernstein inequalities. It is also used to prove Hoeffding's inequality, Bennett's inequality, and McDiarmid's inequality. This inequality can be applied generally to various classes of distributions, including sub-gaussian distributions, sub-gamma distributions, and sums of independent random variables. Chernoff bounds commonly refer to the case where ''X'' is the sum of independent Bernoulli random variables.

When ''X'' is the sum of ''n'' independent random variables X_1, \ldots, X_n, the moment-generating function of ''X'' is the product of the individual moment-generating functions, giving
:\Pr(X \geq a) \leq \inf_{t > 0} \frac{\operatorname E\left[\prod_i e^{tX_i}\right]}{e^{ta}} = \inf_{t > 0} e^{-ta} \prod_i \operatorname E\left[e^{tX_i}\right].
By performing the same analysis on the random variable -X, one can get the same bound in the other direction:
:\Pr(X \leq a) \leq \inf_{t > 0} e^{ta} \prod_i \operatorname E\left[e^{-tX_i}\right].
Specific Chernoff bounds are attained by calculating the moment-generating function \operatorname E\left[e^{tX_i}\right] for specific instances of the random variables X_i. The bounds in the following sections for Bernoulli random variables are derived by using that, for a Bernoulli random variable X_i with probability ''p'' of being equal to 1,
:\operatorname E\left[e^{tX_i}\right] = (1 - p) e^0 + p e^t = 1 + p (e^t - 1) \leq e^{p (e^t - 1)}.
One can encounter many flavors of Chernoff bounds: the original ''additive form'' (which gives a bound on the absolute error) or the more practical ''multiplicative form'' (which bounds the error relative to the mean).
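As a rough numerical illustration of the generic bound, the sketch below evaluates e^{-ta} (1 - p + p e^t)^n for a sum of ''n'' independent Bernoulli(''p'') variables and approximates the infimum over ''t'' > 0 with a plain grid search. The function names, the grid range `t_max` and the grid size `steps` are assumptions made for this example; since every ''t'' > 0 gives a valid bound, the grid minimum is still a valid (if slightly loose) upper bound.

```python
import math


def bernoulli_sum_chernoff(n, p, a, t):
    """Generic Chernoff bound e^{-ta} * (1 - p + p e^t)^n on Pr(X >= a)
    for X a sum of n independent Bernoulli(p) variables, at a fixed t > 0."""
    return math.exp(-t * a + n * math.log(1.0 - p + p * math.exp(t)))


def optimized_chernoff(n, p, a, t_max=10.0, steps=10_000):
    """Approximate the infimum over t > 0 by a grid search on (0, t_max].
    Any grid point is a valid t, so the result is always a valid bound."""
    return min(
        bernoulli_sum_chernoff(n, p, a, t_max * k / steps)
        for k in range(1, steps + 1)
    )


def exact_upper_tail(n, p, a):
    """Exact Pr(X >= a) for X ~ Binomial(n, p), for comparison."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(math.ceil(a), n + 1)
    )


if __name__ == "__main__":
    n, p, a = 100, 0.5, 70
    print("Chernoff bound:", optimized_chernoff(n, p, a))
    print("exact tail:    ", exact_upper_tail(n, p, a))
    print("Markov bound:  ", n * p / a)
```

For these parameters the optimized bound should land well below the Markov bound, which is the qualitative point of the exponential-moment argument.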


Multiplicative form (relative error)

Multiplicative Chernoff bound. Suppose X_1, \ldots, X_n are independent random variables taking values in \{0, 1\}. Let ''X'' denote their sum and let \mu = \operatorname E[X] denote the sum's expected value. Then for any \delta > 0,
:\Pr(X > (1+\delta)\mu) < \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^\mu.
A similar proof strategy can be used to show that, for 0 < \delta < 1,
:\Pr(X < (1-\delta)\mu) < \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^\mu.
The above formula is often unwieldy in practice, so the following looser but more convenient bounds are often used, which follow from the inequality \textstyle\frac{2\delta}{2+\delta} \le \log(1+\delta) from the list of logarithmic inequalities:
:\Pr(X \le (1-\delta)\mu) \le e^{-\delta^2\mu/2}, \qquad 0 \le \delta,
:\Pr(X \ge (1+\delta)\mu) \le e^{-\delta^2\mu/(2+\delta)}, \qquad 0 \le \delta,
:\Pr(|X - \mu| \ge \delta\mu) \le 2e^{-\delta^2\mu/3}, \qquad 0 \le \delta \le 1.
Notice that the bounds are trivial for \delta = 0.
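To get a feel for how much is lost by the convenient forms, the following sketch evaluates the bounds (e^δ/(1+δ)^{1+δ})^μ and (e^{-δ}/(1-δ)^{1-δ})^μ next to the looser e^{-δ²μ/(2+δ)}, e^{-δ²μ/2} and 2e^{-δ²μ/3}. The function names are made up for this example, and the exponents are computed in log space only as a precaution against overflow.

```python
import math


def upper_tail_tight(mu, delta):
    """(e^delta / (1+delta)^(1+delta))^mu, for delta > 0."""
    return math.exp(mu * (delta - (1 + delta) * math.log1p(delta)))


def upper_tail_loose(mu, delta):
    """exp(-delta^2 mu / (2 + delta)), for delta >= 0."""
    return math.exp(-(delta**2) * mu / (2 + delta))


def lower_tail_tight(mu, delta):
    """(e^-delta / (1-delta)^(1-delta))^mu, for 0 < delta < 1."""
    return math.exp(mu * (-delta - (1 - delta) * math.log1p(-delta)))


def lower_tail_loose(mu, delta):
    """exp(-delta^2 mu / 2), for delta >= 0."""
    return math.exp(-(delta**2) * mu / 2)


def two_sided_loose(mu, delta):
    """2 exp(-delta^2 mu / 3), for 0 <= delta <= 1."""
    return 2 * math.exp(-(delta**2) * mu / 3)


if __name__ == "__main__":
    mu = 50.0
    for delta in (0.1, 0.2, 0.5):
        print(
            delta,
            upper_tail_tight(mu, delta),
            upper_tail_loose(mu, delta),
            lower_tail_tight(mu, delta),
            lower_tail_loose(mu, delta),
            two_sided_loose(mu, delta),
        )
```

For small δμ the two-sided value can exceed 1, in which case it carries no information; that is expected rather than a bug.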


Additive form (absolute error)

The following theorem is due to Wassily Hoeffding and hence is called the Chernoff–Hoeffding theorem.
:Chernoff–Hoeffding theorem. Suppose X_1, \ldots, X_n are i.i.d. random variables, taking values in \{0, 1\}. Let p = \operatorname E[X_1] and \varepsilon > 0.
::\begin{align} \Pr\left(\frac{1}{n} \sum X_i \geq p + \varepsilon\right) \leq \left(\left(\frac{p}{p + \varepsilon}\right)^{p+\varepsilon} \left(\frac{1 - p}{1 - p - \varepsilon}\right)^{1 - p - \varepsilon}\right)^n &= e^{-D(p+\varepsilon\parallel p) n} \\ \Pr\left(\frac{1}{n} \sum X_i \leq p - \varepsilon\right) \leq \left(\left(\frac{p}{p - \varepsilon}\right)^{p-\varepsilon} \left(\frac{1 - p}{1 - p + \varepsilon}\right)^{1 - p + \varepsilon}\right)^n &= e^{-D(p-\varepsilon\parallel p) n} \end{align}
:where
:: D(x\parallel y) = x \ln \frac{x}{y} + (1-x) \ln \left(\frac{1-x}{1-y}\right)
:is the Kullback–Leibler divergence between Bernoulli distributed random variables with parameters ''x'' and ''y'' respectively. If p \geq \tfrac{1}{2}, then D(p+\varepsilon\parallel p) \ge \tfrac{\varepsilon^2}{2p(1-p)}, which means
:: \Pr\left(\frac{1}{n}\sum X_i > p + \varepsilon\right) \leq \exp\left(-\frac{\varepsilon^2 n}{2p(1-p)}\right).
A simpler bound follows by relaxing the theorem using D(p+\varepsilon\parallel p) \geq 2\varepsilon^2, which follows from the convexity of D(p+\varepsilon\parallel p) in \varepsilon and the fact that
:\frac{d^2}{d\varepsilon^2} D(p+\varepsilon\parallel p) = \frac{1}{(p+\varepsilon)(1-p-\varepsilon)} \geq 4 = \frac{d^2}{d\varepsilon^2}(2\varepsilon^2).
This result is a special case of Hoeffding's inequality. Sometimes, the bounds
: \begin{align} D((1+x) p \parallel p) \geq \frac{1}{4} x^2 p, & & & -\tfrac{1}{2} \leq x \leq \tfrac{1}{2},\\ D(x \parallel y) \geq \frac{3(x-y)^2}{2(2y+x)}, \\ D(x \parallel y) \geq \frac{(x-y)^2}{2y}, & & & x \leq y,\\ D(x \parallel y) \geq \frac{(x-y)^2}{2x}, & & & x \geq y \end{align}
which are stronger for p < \tfrac{1}{8}, are also used.
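The relationship between the Kullback–Leibler form e^{-D(p+ε‖p)n} and the relaxed form e^{-2ε²n} can be checked numerically. The sketch below is only illustrative; the helper names are assumptions of this example, and the convention 0·log 0 = 0 is applied explicitly.

```python
import math


def kl_bernoulli(x, y):
    """D(x || y) between Bernoulli(x) and Bernoulli(y), with 0 * log(0) = 0.
    Assumes 0 <= x <= 1 and 0 < y < 1."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)

    return term(x, y) + term(1 - x, 1 - y)


def chernoff_hoeffding_upper(n, p, eps):
    """exp(-n D(p + eps || p)): bound on Pr(mean of n samples >= p + eps)."""
    return math.exp(-n * kl_bernoulli(p + eps, p))


def hoeffding_relaxed(n, eps):
    """exp(-2 n eps^2): the simpler bound obtained from D >= 2 eps^2."""
    return math.exp(-2 * n * eps**2)


if __name__ == "__main__":
    for p, eps in ((0.5, 0.2), (0.05, 0.05)):
        print(p, eps, chernoff_hoeffding_upper(100, p, eps), hoeffding_relaxed(100, eps))
```

For ''p'' near 1/2 the two values are close, while for small ''p'' the divergence form is noticeably smaller, in line with the remark that the alternative lower bounds on ''D'' matter most for small ''p''.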


Sums of independent bounded random variables

Chernoff bounds may also be applied to general sums of independent, bounded random variables, regardless of their distribution; this is known as Hoeffding's inequality. The proof follows a similar approach to the other Chernoff bounds, but applies Hoeffding's lemma to bound the moment-generating functions (see Hoeffding's inequality).
:Hoeffding's inequality. Suppose X_1, \ldots, X_n are independent random variables taking values in [a, b]. Let ''X'' denote their sum and let \mu = \operatorname E[X] denote the sum's expected value. Then for any \delta > 0,
::\Pr(X \le (1-\delta)\mu) \le e^{-\frac{2\delta^2\mu^2}{n(b-a)^2}},
::\Pr(X \ge (1+\delta)\mu) \le e^{-\frac{2\delta^2\mu^2}{n(b-a)^2}}.
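A small simulation can make the bound exp(-2δ²μ²/(n(b-a)²)) concrete. The sketch below assumes, purely for illustration, that the summands are uniform on [a, b]; the inequality itself does not depend on that choice, and the function names and trial count are hypothetical.

```python
import math
import random


def hoeffding_tail(n, a, b, mu, delta):
    """exp(-2 delta^2 mu^2 / (n (b - a)^2)): bound on Pr(X >= (1 + delta) mu)
    for a sum X of n independent variables taking values in [a, b]."""
    return math.exp(-2 * delta**2 * mu**2 / (n * (b - a) ** 2))


def empirical_upper_tail(n, a, b, delta, trials, rng):
    """Monte Carlo estimate of Pr(X >= (1 + delta) mu) when the summands
    are Uniform[a, b] (an assumption made only for this example)."""
    mu = n * (a + b) / 2  # mean of a sum of n Uniform[a, b] draws
    hits = sum(
        sum(rng.uniform(a, b) for _ in range(n)) >= (1 + delta) * mu
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)
    n, a, b, delta = 50, 0.0, 1.0, 0.2
    print("bound:    ", hoeffding_tail(n, a, b, n * (a + b) / 2, delta))
    print("empirical:", empirical_upper_tail(n, a, b, delta, 2000, rng))
```

The empirical frequency is typically far below the bound here, because the bound must hold for every distribution supported on [a, b], not just the uniform one.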


Applications

Chernoff bounds have very useful applications in set balancing and packet routing in sparse networks.

The set balancing problem arises while designing statistical experiments. Typically, given the features of each participant in the experiment, we need to know how to divide the participants into two disjoint groups such that each feature is as balanced as possible between the two groups.

Chernoff bounds are also used to obtain tight bounds for permutation routing problems, which reduce network congestion while routing packets in sparse networks.

Chernoff bounds are used in computational learning theory to prove that a learning algorithm is probably approximately correct, i.e. with high probability the algorithm has small error on a sufficiently large training data set.

Chernoff bounds can be effectively used to evaluate the "robustness level" of an application or algorithm by exploring its perturbation space with randomization. The use of the Chernoff bound permits one to abandon the strong, and mostly unrealistic, small perturbation hypothesis (that the perturbation magnitude is small). The robustness level can, in turn, be used either to validate or reject a specific algorithmic choice, a hardware implementation, or the appropriateness of a solution whose structural parameters are affected by uncertainties.

A simple and common use of Chernoff bounds is for "boosting" of randomized algorithms. If one has an algorithm that outputs a guess that is the desired answer with probability ''p'' > 1/2, then one can get a higher success rate by running the algorithm n = \log(1/\delta) \cdot 2p/(p - 1/2)^2 times and outputting a guess that is output by more than ''n''/2 runs of the algorithm. (There cannot be more than one such guess, by the pigeonhole principle.) Assuming that these algorithm runs are independent, the probability that more than ''n''/2 of the guesses are correct is equal to the probability that the sum ''X'' of ''n'' independent Bernoulli random variables that are 1 with probability ''p'' is more than ''n''/2. This can be shown to be at least 1-\delta via the multiplicative Chernoff bound (Corollary 13.3 in Sinclair's class notes):
:\Pr\left[X > \frac{n}{2}\right] \ge 1 - e^{-n(p - 1/2)^2/(2p)} \geq 1-\delta
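The boosting recipe can be sketched in a few lines. The code below computes the number of runs n = log(1/δ)·2p/(p − 1/2)² (rounded up), evaluates the failure bound exp(−n(p − 1/2)²/(2p)), and simulates majority voting with each run modeled as an independent Bernoulli(''p'') success. The function names and the simulation setup are assumptions of this example, not part of any cited source.

```python
import math
import random


def runs_needed(p, delta):
    """Number of independent runs n = log(1/delta) * 2p / (p - 1/2)^2, rounded up."""
    return math.ceil(math.log(1 / delta) * 2 * p / (p - 0.5) ** 2)


def failure_bound(p, n):
    """Chernoff bound exp(-n (p - 1/2)^2 / (2p)) on Pr(at most n/2 runs are correct)."""
    return math.exp(-n * (p - 0.5) ** 2 / (2 * p))


def majority_failure_rate(p, n, trials, rng):
    """Fraction of simulated experiments in which at most n/2 of n runs succeed,
    modeling each run as an independent Bernoulli(p) success."""
    failures = 0
    for _ in range(trials):
        correct = sum(rng.random() < p for _ in range(n))
        if correct <= n / 2:
            failures += 1
    return failures / trials


if __name__ == "__main__":
    p, delta = 0.6, 0.05
    n = runs_needed(p, delta)
    rng = random.Random(1)
    print("runs:", n)
    print("bound on failure:", failure_bound(p, n))
    print("simulated failure rate:", majority_failure_rate(p, n, 500, rng))
```

The simulated failure rate is usually much smaller than δ, since the Chernoff bound is conservative for these parameters.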


Matrix Chernoff bound

Rudolf Ahlswede and Andreas Winter introduced a Chernoff bound for matrix-valued random variables. The following version of the inequality can be found in the work of Tropp.

Let M_1, \ldots, M_t be independent matrix-valued random variables such that M_i \in \mathbb{C}^{d_1 \times d_2} and \mathbb{E}[M_i] = 0. Let us denote by \lVert M \rVert the operator norm of the matrix ''M''. If \lVert M_i \rVert \leq \gamma holds almost surely for all i \in \{1, \ldots, t\}, then for every \varepsilon > 0
:\Pr\left(\left\| \frac{1}{t} \sum_{i=1}^t M_i \right\| > \varepsilon\right) \leq (d_1+d_2) \exp\left(-\frac{3\varepsilon^2 t}{8\gamma^2}\right).
Notice that in order to conclude that the deviation from 0 is bounded by \varepsilon with high probability, we need to choose a number of samples ''t'' proportional to the logarithm of d_1+d_2. In general, unfortunately, a dependence on \log(\min(d_1,d_2)) is inevitable: take for example a diagonal random sign matrix of dimension d \times d. The operator norm of the sum of ''t'' independent samples is precisely the maximum deviation among ''d'' independent random walks of length ''t''. In order to achieve a fixed bound on the maximum deviation with constant probability, it is easy to see that ''t'' should grow logarithmically with ''d'' in this scenario. The following theorem can be obtained by assuming ''M'' has low rank, in order to avoid the dependency on the dimensions.
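The logarithmic dependence on the dimension can be read off by solving (d_1+d_2) exp(−3ε²t/(8γ²)) ≤ η for ''t'', where η is a target failure probability introduced only for this example. The sketch below does that and also simulates the diagonal random sign example, where the operator norm of the averaged matrix is the largest absolute value among ''d'' independent random-walk averages. All names are hypothetical.

```python
import math
import random


def matrix_chernoff_bound(d1, d2, gamma, eps, t):
    """(d1 + d2) * exp(-3 eps^2 t / (8 gamma^2))."""
    return (d1 + d2) * math.exp(-3 * eps**2 * t / (8 * gamma**2))


def samples_for_failure_prob(d1, d2, gamma, eps, eta):
    """Smallest integer t for which the bound above is at most eta."""
    return math.ceil(8 * gamma**2 * math.log((d1 + d2) / eta) / (3 * eps**2))


def diagonal_sign_average_norm(d, t, rng):
    """Operator norm of the average of t independent d x d diagonal random sign
    matrices: the maximum over d coordinates of |random walk of length t| / t."""
    return max(
        abs(sum(rng.choice((-1, 1)) for _ in range(t))) for _ in range(d)
    ) / t


if __name__ == "__main__":
    rng = random.Random(2)
    for d in (10, 100, 1000):
        t = samples_for_failure_prob(d, d, 1.0, 0.5, 0.01)
        print(d, t, diagonal_sign_average_norm(d, t, rng))
```

Increasing ''d'' by a factor of 10 adds a roughly constant number of samples, which is the logarithmic growth described in the text.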


Theorem without the dependency on the dimensions

Let 0 < \varepsilon < 1 and let ''M'' be a random symmetric real matrix with \| \operatorname E[M] \| \leq 1 and \| M \| \leq \gamma almost surely. Assume that each element on the support of ''M'' has at most rank ''r''. Set
: t = \Omega\left(\frac{\gamma \log(\gamma/\varepsilon^2)}{\varepsilon^2}\right).
If r \leq t holds almost surely, then
:\Pr\left(\left\| \frac{1}{t} \sum_{i=1}^t M_i - \operatorname E[M] \right\| > \varepsilon\right) \leq \frac{1}{\mathbf{poly}(t)}
where M_1, \ldots, M_t are i.i.d. copies of ''M''.


Sampling variant

The following variant of Chernoff's bound can be used to bound the probability that a majority in a population will become a minority in a sample, or vice versa. Suppose there is a general population ''A'' and a sub-population ''B'' ⊆ ''A''. Mark the relative size of the sub-population (|''B''|/|''A''|) by ''r''. Suppose we pick an integer ''k'' and a random sample ''S'' ⊂ ''A'' of size ''k''. Mark the relative size of the sub-population in the sample (|''B''∩''S''|/|''S''|) by ''r<sub>S</sub>''. Then, for every fraction ''d'' ∈ [0,1]:
:\Pr\left(r_S < (1-d)\cdot r\right) < \exp\left(-r\cdot d^2 \cdot \frac k 2\right)
In particular, if ''B'' is a majority in ''A'' (i.e. ''r'' > 0.5) we can bound the probability that ''B'' will remain a majority in ''S'' (''r<sub>S</sub>'' > 0.5) by taking ''d'' = 1 − 1/(2''r''):
:\Pr\left(r_S > 0.5\right) > 1 - \exp\left(-r\cdot \left(1 - \frac{1}{2r}\right)^2 \cdot \frac k 2\right)
This bound is of course not tight at all. For example, when ''r'' = 0.5 we get the trivial bound Prob > 0.
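How loose the sampling bound is can be seen by comparing 1 − exp(−r·(1 − 1/(2r))²·k/2) with a simulated frequency. The sketch below draws samples without replacement using `random.sample`; the population size, the encoding of ''B'' as the first `sub_size` indices, and the function names are all assumptions made for this illustration.

```python
import math
import random


def majority_preserved_lower_bound(r, k):
    """1 - exp(-r * d^2 * k / 2) with d = 1 - 1/(2r): lower bound on Pr(r_S > 0.5)."""
    d = 1 - 1 / (2 * r)
    return 1 - math.exp(-r * d**2 * k / 2)


def empirical_majority_rate(pop_size, sub_size, k, trials, rng):
    """Fraction of random size-k samples (without replacement) in which the
    sub-population, encoded as indices 0..sub_size-1, is a strict majority."""
    kept = 0
    for _ in range(trials):
        sample = rng.sample(range(pop_size), k)
        in_b = sum(1 for member in sample if member < sub_size)
        if in_b / k > 0.5:
            kept += 1
    return kept / trials


if __name__ == "__main__":
    rng = random.Random(3)
    print("bound:    ", majority_preserved_lower_bound(0.6, 100))
    print("empirical:", empirical_majority_rate(1000, 600, 100, 1000, rng))
    print("r = 0.5:  ", majority_preserved_lower_bound(0.5, 100))
```

With ''r'' = 0.6 and ''k'' = 100 the simulated frequency is usually well above the bound, consistent with the remark that the bound is not tight.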


Proofs


Multiplicative form

Following the conditions of the multiplicative Chernoff bound, let X_1, \ldots, X_n be independent Bernoulli random variables, whose sum is ''X'', each having probability ''p<sub>i</sub>'' of being equal to 1. For a Bernoulli variable:
:\operatorname E\left[e^{tX_i}\right] = (1 - p_i) e^0 + p_i e^t = 1 + p_i (e^t - 1) \leq e^{p_i (e^t - 1)}
So, using the generic Chernoff bound for a sum of independent random variables with a = (1+\delta)\mu for any \delta > 0 and where \mu = \operatorname E[X] = \textstyle\sum_{i=1}^n p_i,
:\begin{align} \Pr(X > (1 + \delta)\mu) &\le \inf_{t > 0} \exp(-t(1+\delta)\mu)\prod_{i=1}^n \operatorname E[\exp(tX_i)] \\ &\leq \inf_{t > 0} \exp\Big(-t(1+\delta)\mu + \sum_{i=1}^n p_i(e^t - 1)\Big) \\ &= \inf_{t > 0} \exp\Big(-t(1+\delta)\mu + (e^t - 1)\mu\Big). \end{align}
If we simply set t = \log(1+\delta), so that t > 0 for \delta > 0, we can substitute and find
:\exp\Big(-t(1+\delta)\mu + (e^t - 1)\mu\Big) = \frac{\exp((1+\delta-1)\mu)}{(1+\delta)^{(1+\delta)\mu}} = \left[\frac{e^\delta}{(1+\delta)^{(1+\delta)}}\right]^\mu.
This proves the result desired.
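The substitution t = log(1+δ) can be sanity-checked numerically. Setting the derivative of −t(1+δ)μ + (e^t − 1)μ with respect to ''t'' to zero gives e^t = 1+δ, so this choice is in fact the minimizer. The sketch below, with hypothetical function names, compares the exponent at that point with the closed form and with a grid of other values of ''t''.

```python
import math


def exponent(t, mu, delta):
    """-t (1 + delta) mu + (e^t - 1) mu: the exponent in the last line of the proof."""
    return -t * (1 + delta) * mu + (math.exp(t) - 1) * mu


def closed_form(mu, delta):
    """(e^delta / (1 + delta)^(1 + delta))^mu."""
    return (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu


if __name__ == "__main__":
    mu, delta = 20.0, 0.5
    t_star = math.log1p(delta)  # t = log(1 + delta)
    print("exp(exponent at t*):", math.exp(exponent(t_star, mu, delta)))
    print("closed form:        ", closed_form(mu, delta))
    # No grid point should do better than t*.
    print("grid minimum:       ", min(math.exp(exponent(k / 100, mu, delta)) for k in range(1, 300)))
```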


Chernoff–Hoeffding theorem (additive form)

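Let q = p + \varepsilon. Taking a = nq in the generic Chernoff bound for a sum of independent random variables, we obtain:
:\Pr\left(\frac{1}{n} \sum X_i \ge q\right) \le \inf_{t > 0} \frac{\operatorname E\left[\prod e^{tX_i}\right]}{e^{tnq}} = \inf_{t > 0} \left(\frac{\operatorname E\left[e^{tX_i}\right]}{e^{tq}}\right)^n.
Now, knowing that \Pr(X_i = 1) = p and \Pr(X_i = 0) = 1 - p, we have
:\left(\frac{\operatorname E\left[e^{tX_i}\right]}{e^{tq}}\right)^n = \left(\frac{p e^t + (1-p)}{e^{tq}}\right)^n = \left(pe^{(1-q)t} + (1-p)e^{-qt}\right)^n.
Therefore, we can easily compute the infimum, using calculus:
:\frac{d}{dt} \left(pe^{(1-q)t} + (1-p)e^{-qt}\right) = (1-q)pe^{(1-q)t} - q(1-p)e^{-qt}
Setting the derivative to zero and solving, we have
:\begin{align} (1-q)pe^{(1-q)t} &= q(1-p)e^{-qt} \\ (1-q)pe^{t} &= q(1-p) \end{align}
so that
:e^t = \frac{(1-p)q}{(1-q)p}.
Thus,
:t = \log\left(\frac{(1-p)q}{(1-q)p}\right).
As q = p + \varepsilon > p, we see that t > 0, so this value of ''t'' is admissible in the infimum. Having solved for ''t'', we can plug back into the equations above to find that
:\begin{align} \log\left(pe^{(1-q)t} + (1-p)e^{-qt}\right) &= \log\left(e^{-qt}(1-p+pe^t)\right) \\ &= \log\left(e^{-q \log\left(\frac{(1-p)q}{(1-q)p}\right)}\right) + \log\left(1-p+pe^{\log\left(\frac{1-p}{1-q}\right)}e^{\log\frac{q}{p}}\right) \\ &= -q\log\frac{1-p}{1-q} - q\log\frac{q}{p} + \log\left(1-p+p\left(\frac{1-p}{1-q}\right)\frac{q}{p}\right) \\ &= -q\log\frac{1-p}{1-q} - q\log\frac{q}{p} + \log\left(\frac{(1-p)(1-q)}{1-q}+\frac{(1-p)q}{1-q}\right) \\ &= -q\log\frac{q}{p} + \left(-q\log\frac{1-p}{1-q} + \log\frac{1-p}{1-q}\right) \\ &= -q\log\frac{q}{p} + (1-q)\log\frac{1-p}{1-q} \\ &= -D(q \parallel p). \end{align}
We now have our desired result, that
:\Pr\left(\tfrac{1}{n}\sum X_i \ge p + \varepsilon\right) \le e^{-D(p+\varepsilon\parallel p) n}.
To complete the proof for the symmetric case, we simply define the random variable Y_i = 1 - X_i, apply the same proof, and plug it into our bound.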
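The algebra in this proof can be spot-checked numerically: at t = log((1−p)q/((1−q)p)), the quantity log(p e^{(1−q)t} + (1−p) e^{−qt}) should equal −D(q‖p). The sketch below is a minimal check for parameters with 0 < p < q < 1; the function names are introduced only for this example.

```python
import math


def log_mgf_ratio(t, p, q):
    """log(p e^{(1-q) t} + (1 - p) e^{-q t}): per-sample log of the bound at a given t."""
    return math.log(p * math.exp((1 - q) * t) + (1 - p) * math.exp(-q * t))


def optimal_t(p, q):
    """Stationary point t = log((1 - p) q / ((1 - q) p)); positive when q > p."""
    return math.log((1 - p) * q / ((1 - q) * p))


def kl_bernoulli(q, p):
    """D(q || p) for Bernoulli parameters strictly between 0 and 1."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))


if __name__ == "__main__":
    p, q = 0.3, 0.5
    t = optimal_t(p, q)
    print("t*:", t)
    print("log bound per sample:", log_mgf_ratio(t, p, q))
    print("-D(q || p):          ", -kl_bernoulli(q, p))
```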


See also

* Bernstein inequalities
* Concentration inequality, a summary of tail bounds on random variables
* Cramér's theorem
* Entropic value at risk
* Hoeffding's inequality
* Matrix Chernoff bound
* Moment generating function

